Imitation Learning with a Value-Based Prior
Authors
Abstract
The goal of imitation learning is for an apprentice to learn how to behave in a stochastic environment by observing a mentor demonstrating the correct behavior. Accurate prior knowledge about the correct behavior can reduce the need for demonstrations from the mentor. We present a novel approach to encoding prior knowledge about the correct behavior, where we assume that this prior knowledge takes the form of a Markov Decision Process (MDP) that is used by the apprentice as a rough and imperfect model of the mentor’s behavior. Specifically, taking a Bayesian approach, we treat the value of a policy in this modeling MDP as the log prior probability of the policy. In other words, we assume a priori that the mentor’s behavior is likely to be a high-value policy in the modeling MDP, though quite possibly different from the optimal policy. We describe an efficient algorithm that, given a modeling MDP and a set of demonstrations by a mentor, provably converges to a stationary point of the log posterior of the mentor’s policy, where the posterior is computed with respect to the “value-based” prior. We also present empirical evidence that this prior does in fact speed learning of the mentor’s policy, and is an improvement in our experiments over similar previous methods.
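The core idea above can be sketched in a few lines of code. The following toy example is an illustration, not the paper's implementation: the MDP, the demonstrations, and all names (`policy_value`, `log_posterior`, etc.) are hypothetical. It treats the average state value of a stochastic policy in the modeling MDP as the log prior, and adds the log-likelihood of the demonstrated state-action pairs to form the (unnormalized) log posterior:

```python
import numpy as np

# Hypothetical toy modeling MDP (not from the paper): 2 states, 2 actions.
# P[a, s, s'] : transition probabilities; R[s, a] : rewards.
P = np.array([
    [[0.9, 0.1], [0.1, 0.9]],   # action 0
    [[0.1, 0.9], [0.9, 0.1]],   # action 1
])
R = np.array([[1.0, 0.0],       # in state 0, action 0 is rewarding
              [0.0, 1.0]])      # in state 1, action 1 is rewarding
GAMMA = 0.9

def policy_value(policy):
    """Average state value of a stochastic policy[s, a] in the modeling MDP,
    computed by solving the linear policy-evaluation equations."""
    P_pi = np.einsum('sa,ast->st', policy, P)   # induced state-to-state chain
    R_pi = np.einsum('sa,sa->s', policy, R)     # expected per-state reward
    V = np.linalg.solve(np.eye(len(R_pi)) - GAMMA * P_pi, R_pi)
    return V.mean()

def log_posterior(policy, demos):
    """Unnormalized log posterior of a policy: log-likelihood of the
    demonstrated (state, action) pairs plus the value-based log prior."""
    log_lik = sum(np.log(policy[s, a]) for s, a in demos)
    return log_lik + policy_value(policy)       # policy value acts as log prior

demos = [(0, 0), (1, 1), (0, 0)]                # mentor takes the good actions
good = np.array([[0.9, 0.1], [0.1, 0.9]])      # fits demos, high value in MDP
bad  = np.array([[0.1, 0.9], [0.9, 0.1]])      # contradicts demos, low value
assert log_posterior(good, demos) > log_posterior(bad, demos)
```

The paper's algorithm ascends this log posterior to a stationary point; the sketch only evaluates it, which is enough to see how the value term biases the posterior toward high-value policies even before many demonstrations arrive.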
Similar References
Robots Learn to Recognize Individuals from Imitative Encounters with People and Avatars.
Prior to language, human infants are prolific imitators. Developmental science grounds infant imitation in the neural coding of actions, and highlights the use of imitation for learning from and about people. Here, we used computational modeling and a robot implementation to explore the functional value of action imitation. We report 3 experiments using a mutual imitation task between robots, a...
A Bayesian Model of Imitation in Infants and Robots
Learning through imitation is a powerful and versatile method for acquiring new behaviors. In humans, a wide range of behaviors, from styles of social interaction to tool use, are passed from one generation to another through imitative learning. Although imitation evolved through Darwinian means, it achieves Lamarckian ends: it is a mechanism for the inheritance of acquired characteristics. Unl...
Observed Body Clustering for Imitation Based on Value System
In order to develop skills, actions, and behavior in a human symbiotic environment, a robot must learn something from behavior observation of predecessors or humans. Recently, robotic imitation methods based on many approaches have been proposed. We have proposed reinforcement learning based approaches for the imitation and investigated them under an assumption that an observer recognizes the b...
A Bayesian Approach to Imitation in Reinforcement Learning
In multiagent environments, forms of social learning such as teaching and imitation have been shown to aid the transfer of knowledge from experts to learners in reinforcement learning (RL). We recast the problem of imitation in a Bayesian framework. Our Bayesian imitation model allows a learner to smoothly pool prior knowledge, data obtained through interaction with the environment, and informa...
Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach
Imitation learning refers to the problem of learning how to behave by observing a teacher in action. We consider imitation learning in relational domains, in which there is a varying number of objects and relations among them. In prior work, simple relational policies are learned by viewing imitation learning as supervised learning of a function from states to actions. For propositional worlds,...